Tieniu Tan

RC-GRPO: Reward-Conditioned Group Relative Policy Optimization for Multi-Turn Tool Calling Agents

Feb 03, 2026
Via arXiv

How Well Do Models Follow Visual Instructions? VIBE: A Systematic Benchmark for Visual Instruction-Driven Image Editing

Feb 02, 2026
Via arXiv

DiaDem: Advancing Dialogue Descriptions in Audiovisual Video Captioning for Multimodal Large Language Models

Jan 27, 2026
Via arXiv

Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing

Jun 11, 2025
Via arXiv

VersaVid-R1: A Versatile Video Understanding and Reasoning Model from Question Answering to Captioning Tasks

Jun 10, 2025
Via arXiv

BridgeVLA: Input-Output Alignment for Efficient 3D Manipulation Learning with Vision-Language Models

Jun 09, 2025
Via arXiv

REACT: Representation Extraction And Controllable Tuning to Overcome Overfitting in LLM Knowledge Editing

May 25, 2025
Via arXiv

Rethinking the Role of Prompting Strategies in LLM Test-Time Scaling: A Perspective of Probability Theory

May 16, 2025
Via arXiv

A Call for New Recipes to Enhance Spatial Reasoning in MLLMs

Apr 21, 2025
Via arXiv

MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models

Apr 07, 2025
Via arXiv